This folder contains complete replication material for: Bølstad, Jørgen, 'Capturing Rationalization Bias and Differential Item Functioning: A Unified Bayesian Scaling Approach', Political Analysis.

GUIDE FOR FUTURE USE
For researchers interested in using the models presented in the article, the file 'A_brief_guide_to_BAM2_and_ISR.html' provides a discussion of relevant functions, as well as examples of how to use them. 

COMPUTATIONAL REQUIREMENTS
The replication code requires a processor with at least 3 cores, as it is set to run 3 HMC chains in parallel. It also requires between 750 MB and 800 MB of free disk space. Beyond this, there are no specific requirements, although the simulations will run faster on a computer with fast cores and a large cache.
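Before starting the replication, you may want to confirm from within R that the machine has enough cores for the 3 parallel chains. The sketch below uses base R's parallel::detectCores(). Setting the mc.cores option is a common rstan convention and is an assumption here: the replication scripts may configure parallel chains in their own way.

```r
# Check whether this machine has enough cores for the 3 parallel HMC chains.
# detectCores() can return NA on some platforms, so NA is treated as "unknown".
n_cores <- parallel::detectCores()
enough_cores <- !is.na(n_cores) && n_cores >= 3

if (!enough_cores) {
  warning("Fewer than 3 cores detected (or count unknown); ",
          "the 3 HMC chains may not run fully in parallel.")
}

# Assumption: rstan's sampling() uses getOption("mc.cores") as its default
# number of cores; the replication scripts may set this differently.
options(mc.cores = min(3, if (is.na(n_cores)) 1 else n_cores))
```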

SESSION INFO
The code was originally run on a system with the following characteristics.

Hardware:
Intel i5-6500 processor (4 cores, 3.20 GHz, 6 MB cache) and 8 GB RAM.

Software:
R version 3.5.1 (2018-07-02) -- "Feather Spray"
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
 
Attached base packages:
tools     stats     graphics  grDevices utils     datasets  methods  base     

Other attached packages:
xtable_1.8-2       mgcv_1.8-24        nlme_3.1-137       basicspace_0.20    rstan_2.18.2       StanHeaders_2.18.0 ggplot2_3.0.0     

Loaded via a namespace (and not attached):
Rcpp_0.12.18       pillar_1.3.0       compiler_3.5.1     plyr_1.8.4         bindr_0.1.1        prettyunits_1.0.2  pkgbuild_1.0.2     tibble_1.4.2       gtable_0.2.0       lattice_0.20-35    pkgconfig_2.0.1    rlang_0.2.1        Matrix_1.2-14      cli_1.0.0          parallel_3.5.1     loo_2.0.0          bindrcpp_0.2.2     gridExtra_2.3      withr_2.1.2        dplyr_0.7.6        stats4_3.5.1       grid_3.5.1         tidyselect_0.2.4   glue_1.3.0         inline_0.3.15      R6_2.2.2           processx_3.1.0     callr_2.0.4        purrr_0.2.5        magrittr_1.5       scales_1.0.0       matrixStats_0.54.0 assertthat_0.2.0   colorspace_1.3-2   lazyeval_0.2.1     munsell_0.5.0      crayon_1.3.4     

REPLICATION
To use the replication files, either start R in the replication folder or set this folder as the working directory using setwd(). The file 'replication_master.R' runs all the necessary scripts in the correct order and reproduces all tables and figures in both the article and the supplementary material.

The specific files to replicate the results are in the folder called 'scripts'. These are numbered from 00 to 03, with letters added to designate smaller parts of each analysis. The roles of the numbered files are as follows (with approximate running times on the above-mentioned system in parentheses): 

- The 00 file should be run before all other files, as it compiles and stores the models. (5 minutes)
- The 01-files run the simulations and plot the results. (6-7 days)
- The 02-files reproduce all results reported for the UK. (1 hour)
- The 03-files reproduce the results for 14 additional countries. (5 hours)

LIMITS TO EXACT REPLICATION
These replication files set seeds both for the generation of starting values in R and for the random number generator used by rstan. This is sufficient to get the exact same results in repeated runs on the same system, but it does not ensure exact replication across systems or software versions. The Stan Reference Manual (available at https://mc-stan.org/docs/2_18/reference-manual/reproducibility-chapter.html) explains why:

“Floating point operations on modern computers are notoriously difficult to replicate because the fundamental arithmetic operations, right down to the IEEE 754 encoding level, are not fully specified. The primary problem is that the precision of operations varies across different hardware platforms and software implementations. ... Stan is designed to allow full reproducibility. However, this is only possible up to the external constraints imposed by floating point arithmetic.

Stan results will only be exactly reproducible if all of the following components are identical:
- Stan version
- Stan interface (RStan, PyStan, CmdStan) and version, plus version of interface language (R, Python, shell)
- versions of included libraries (Boost and Eigen)
- operating system version
- computer hardware including CPU, motherboard and memory
- C++ compiler, including version, compiler flags, and linked libraries
- same configuration of call to Stan, including random seed, chain ID, initialization and data”

In short, running these replication files on a different system will almost inevitably yield slightly different results from those reported in the article. Note, however, that the samples are still drawn from the same posterior distribution, and the differences across systems are minor and substantively negligible.
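The role of the two seeds described above can be sketched in R as follows. This is an illustration under stated assumptions, not the code used in the replication scripts: make_inits(), the parameter name theta, compiled_model and stan_data are hypothetical, while init and seed are existing arguments of rstan's sampling(). The sketch shows why a fixed R seed reproduces the same starting values in repeated runs on the same system.

```r
# Generate one list of starting values per HMC chain from a fixed R seed.
# 'theta' is a hypothetical parameter name used only for illustration.
make_inits <- function(n_chains = 3, n_params = 5, seed = 12345) {
  set.seed(seed)
  lapply(seq_len(n_chains), function(chain) list(theta = rnorm(n_params)))
}

inits_a <- make_inits()
inits_b <- make_inits()
identical(inits_a, inits_b)  # TRUE on the same system and R version

# The Stan-side seed is passed separately (hypothetical object names):
# fit <- rstan::sampling(compiled_model, data = stan_data, chains = 3,
#                        init = inits_a, seed = 12345)
# Even with both seeds fixed, results may differ across systems or versions.
```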

INCLUDED RESULTS
In addition to the replication code, this folder contains all results that do not require significant storage space. These include summaries of the simulation results and summaries of the results for the 14 additional countries (reported in the supplementary material). The results for the UK are not included: the fitted models would require about 550 MB and the posterior predictions another 200 MB, whereas these analyses are relatively quick to run.

CONTACT INFO
For comments or questions, contact Jørgen Bølstad at j.boelstad@gmail.com
